Towards Facilitating the Accessibility of Web 2.0 Texts through Text Normalisation

Authors

  • Alejandro Mosquera
  • Elena Lloret
  • Paloma Moreda
Abstract

The Web 2.0, through its different platforms, such as blogs, social networks, microblogs, or forums, allows users to freely write content on the Internet in order to provide, share and use information. However, the non-standard features of the language used in Web 2.0 publications can make social media content less accessible than traditional texts. For this reason we propose TENOR, a multilingual lexical approach for normalising Web 2.0 texts. Given a noisy sentence in either Spanish or English, our aim is to transform it into its canonical form, so that it can be easily understood by any person or by text simplification tools. Our experimental results show that TENOR is an adequate tool for this task, facilitating text simplification with current NLP tools when required and also making Web 2.0 texts more accessible to people unfamiliar with these text types.


Similar resources

On Evaluating the Contribution of Text Normalisation Techniques to Sentiment Analysis on Informal Web 2.0 Texts / Evaluación de la Contribución de la Normalización al Análisis de Sentimiento en Textos Informales de la Web 2.0

The writing style used in social media usually contains informal elements that can lower the performance of Natural Language Processing applications. For this reason, text normalisation techniques have drawn a lot of attention recently when dealing with informal content. However, not all the texts present the same level of informality and may not require additional pre-processing steps. Therefo...

Improving Web 2.0 Opinion Mining Systems Using Text Normalisation Techniques

A basic task in opinion mining deals with determining the overall polarity orientation of a document about some topic. This has several applications such as detecting consumer opinions in on-line product reviews or increasing the effectiveness of social media marketing campaigns. However, the informal features of Web 2.0 texts can affect the performance of automated opinion mining tools. These ...

DLSI en Tweet-Norm 2013: Normalización de Tweets en Español

Its lexical richness and ease of access to large volumes of information make the Web 2.0 an important resource for Natural Language Processing. Nevertheless, the frequent presence of non-normative linguistic phenomena can make any automatic processing challenging. This paper describes the participation in the Text Normalisation Workshop at the SEPLN conference (Tweet-nor...

Access Toolkit for Education

This paper describes three tools that have been developed to help overcome accessibility, usability and productivity issues identified by disabled students. The Web2Access website allows users to test any Web 2.0 site or software application against a series of checks linked to the WCAG 2.0 and other guidelines. The Access Tools accessible menu helps with navigation to portable pen drive applic...

Reading Performance of Iranian EFL Learners in Paper and Digital texts

Dependence on computers and internet has given birth to digital literacy. However, research into its influences on the reading process is still in its infancy. To fill the gap, this study was designed to investigate the ways in which text presentation mode (paper vs. digital) affects reading comprehension, as well as reading attitudes. To this end, a sample of 30 male and female English major s...

Publication date: 2012